Reinforcement learning in signaling game 



Yilei Hu^, Brian Skyrms^ and Pierre Tarres^ 
January 18, 2013 

Abstract 

We consider a signaling game originally introduced by Skyrms, which models how two 
interacting players learn to signal each other and thus create a common language. The first 
rigorous analysis was done by Argiento, Pemantle, Skyrms and Volkov (2009) with 2 states, 2 
signals and 2 acts. We study the case of $M_1$ states, $M_2$ signals and $M_1$ acts for general
$M_1, M_2 \in \mathbb{N}$. We prove that the expected payoff increases on average and thus
converges a.s., and that a limit bipartite graph emerges, such that no signal-state
correspondence is associated to both a synonym and an informational bottleneck. Finally, we
show that any graph correspondence with the above property is a limit configuration with
positive probability.

^University of Oxford, Mathematical Institute, 24-29 St Giles, Oxford OX1 3LB, United Kingdom.
E-mail: Yilei.Hu@maths.ox.ac.uk.

^School of Social Sciences, University of California at Irvine, CA 92607. E-mail: bskyrms@uci.edu

^CNRS, Université de Toulouse, Institut de Mathématiques, 118 route de Narbonne, 31062
Toulouse Cedex 9, France. On leave from the Mathematical Institute, University of Oxford.
E-mail: tarres@math.univ-toulouse.fr



1 Introduction 



1.1 Signaling game 

Signaling games aim to provide a theoretical framework for the following basic
question: how do individuals create a common language? The setting was introduced
as follows by the philosopher D. Lewis (1969).

Consider two players, one called Sender, who is regularly given some information that 
the other does not have and seeks to transmit it, and the other called Receiver. 
Sender has a fixed set of signals at his disposal throughout the game, which do 
not have any intrinsic meaning at the very beginning, in the sense that no signal 
is a priori associated to any state; similarly Receiver has a fixed set of possible 
acts. The game is thereafter repeatedly played according to the following steps: (1)
Sender observes a certain state of nature, of which Receiver is not aware. (2) 
Sender chooses a signal and then sends it to Receiver. (3) Receiver observes the 
signal but not the state, and then chooses an act. (4) Both players receive payoffs 
at the end of each round, which are functions of state and act. 

The process involving the above four steps is called one communication. Sender,
Receiver, states, signals and acts constitute one basic communication system. The game
lies in the choices of signals and acts by the agents. Note that we do not fix any strategy
at this point; this is the purpose of Section 1.4.

1.2 Mathematical definition



As we will explain at length later on, we adopt a dynamic perspective to investigate
this game. The outcome of a single play of the game is not of much interest to us. What we
ask is whether the players can establish a signaling system, within which each signal is
uniquely related to a certain state, if they play this game repeatedly.

In this section, we provide precise mathematical definitions for the objects appearing
in Section 1.1, and we end with a mathematical definition of the repeated signaling game.



Probability space. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, which is a
sufficiently rich source of randomness. More specifically, the probability space is at least
rich enough for all the random variables appearing in this section to be defined on it.

State spaces. Let $S_1$ be the set of states, $S_2$ be the set of signals and $A$ be the set of
acts.

Players and strategies. We introduce Nature as a player in this game, who
assigns a state of nature to Sender at each round of the game. The three players, Nature,
Sender and Receiver, each generate a random sequence, denoting their strategies
throughout this repeated game. More specifically,

(1) Nature generates a sequence $(S_n)_{n\in\mathbb{N}}$ of random variables taking values in
the set of states $S_1$, each denoting which state Nature assigns to Sender at the
corresponding round of the game.

(2) Sender generates a sequence $(Y_n)_{n\in\mathbb{N}}$ of random variables taking values in
the set of signals $S_2$, each denoting which signal Sender sends to Receiver at the
corresponding round of the game.

(3) Receiver generates a sequence $(Z_n)_{n\in\mathbb{N}}$ of random variables taking values in
the set of acts $A$, each denoting which state Receiver interprets the signal as at the
corresponding round of the game.

Payoffs. Let the mappings $u_n^1$ and $u_n^2$ from $S_1 \times S_2 \times A$ to $[0, \infty)$ be
the payoff functions for Sender and Receiver at the $n$-th round of the game. In other words,
Sender and Receiver gain payoffs $u_n^1(S_n, Y_n, Z_n)$ and $u_n^2(S_n, Y_n, Z_n)$ respectively
at the end of the $n$-th round of the game.

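Information. Let the filtration $(\mathcal{F}_n^1)_{n\in\mathbb{N}}$ (resp.
$(\mathcal{F}_n^2)_{n\in\mathbb{N}}$) denote the information available to Sender (resp.
Receiver) before he makes his decision at each round of the game: for each
$n \in \mathbb{N}$, we let

$$\mathcal{F}_n^1 := \sigma\big(S_i,\ 0 \le i \le n,\ Y_i,\ u_i^1(S_i, Y_i, Z_i),\ 0 \le i \le n-1\big),$$
$$\mathcal{F}_n^2 := \sigma\big(Y_i,\ 0 \le i \le n,\ Z_i,\ u_i^2(S_i, Y_i, Z_i),\ 0 \le i \le n-1\big).$$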

Initial settings. The distributions of the i.i.d. random variables $(S_n)_{n\in\mathbb{N}}$ and
the distributions of $Y_0$ and $Z_0$ are given at the beginning of the game; they are called
the prior distributions.

Updating rule for strategies. Let the $\mathcal{F}_n^1$-measurable mapping $p_n^1$,

$$\mathrm{dist}(Y_{n+1}) = p_n^1\big(S_i,\ 1 \le i \le n,\ Y_i,\ u_i^1(S_i, Y_i, Z_i),\ 1 \le i \le n-1\big), \qquad (1)$$

be the updating rule of strategies for Sender at the $n$-th round of the game.
Let the $\mathcal{F}_n^2$-measurable mapping $p_n^2$,

$$\mathrm{dist}(Z_{n+1}) = p_n^2\big(S_i,\ 1 \le i \le n,\ Z_i,\ u_i^2(S_i, Y_i, Z_i),\ 1 \le i \le n-1\big), \qquad (2)$$

be the updating rule of strategies for Receiver at the $n$-th round of the game. Different
$(p_n^1)_{n\in\mathbb{N}}$ and $(p_n^2)_{n\in\mathbb{N}}$ represent different learning rules.

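Definition 1.1 (Signaling game). A repeated signaling game $G$ is defined as:

$$G := \big((\Omega, \mathcal{F}, \mathbb{P}),\ S_1,\ S_2,\ A,\ (S_n, Y_n, Z_n, u_n^1, u_n^2, p_n^1, p_n^2)_{n\in\mathbb{N}}\big).$$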



1.3 Questions 

Throughout the paper, we limit our attention to a special, but most common,
circumstance under which

(A1) the set of acts matches the set of states of nature by a bijective map; and

(A2) both players only receive fixed payoffs when the act chosen by Receiver matches 
the state of nature, which is considered as a successful communication; otherwise 
they obtain nothing. 

Under these assumptions, the act can be understood as an interpreted state. Note
however that the Receiver does not necessarily know the possible states of nature: the
mutual goal for the two players is to make the communication succeed, but they are not
always aware that they actually coordinate with each other, or even that they are involved
in a communication game.

The analysis of choices of strategies by the players gives rise to the following

Question 1.1. What game-theoretical equilibrium is most likely to arise in this repeated
game?

This issue can be analyzed either from a static perspective with classic equilibrium
analysis, see Trapa and Nowak (2000), Huttegger (2007) and Pawlowitsch (2008), or from
a dynamic perspective, through individual learning models or evolutionary strategies,
see for instance Huttegger and Zollman (2010). The latter perspective investigates the
evolutionary pathway out of equilibria:

Question 1.2. Does the communication system asymptotically reach a stable equilibrium
state? If so, what are the good candidates, and how does the communication
system reach them?

However, modeling the game through an individual learning process involves many
factors, for instance the level of rationality of the players. In particular,
we are interested in the following question, which was first raised by Skyrms:

Question 1.3. What is the simplest mechanism to ensure the emergence of a signaling 
system in this repeated game? 

1.4 Reinforcement learning 

In this paper we adopt a dynamic perspective, based on the following individual 
reinforcement learning model. 

(A3) Roth-Erev reinforcement learning rule (or Herrnstein's matching law): the
probability of choosing an action is proportional to its accumulated rewards.

Assumption (A3) determines the updating rule for the distributions of strategies of the
players. The corresponding behavior is analyzed by Argiento et al. (2009) in the 2-state,
2-signal, 2-act case, who show that an optimal signaling system emerges eventually, in
the sense of a one-to-one correspondence between states and signals. We study here
the case of $M_1$ states, $M_2$ signals and $M_1$ acts for general $M_1, M_2 \in \mathbb{N}$.

Note that the Roth-Erev reinforcement rule is one of many possible strategies of the
players, who can have various levels of rationality, each of them leading to a different
learning process; for instance the myopic and best response models, see Fudenberg and
Levine (1998). Let us briefly motivate the reinforcement condition, corresponding to a
low level of rationality. It is natural to believe that individuals with high rationality,
devoting themselves to the task of establishing a common language, would rapidly
succeed. It is interesting to study whether, on the contrary, under the only assumption
that these individuals have a good memory of their own past experience and aspire to
a better payoff, a signaling system would also emerge, and how optimal the limiting
system is. This pertains to individuals who have lower cognitive ability, or who do not
devote themselves totally to the task of learning the game or of taking optimal decisions.



1.5 The model 
1.5.1 Assumptions 

Let $G = \big((\Omega, \mathcal{F}, \mathbb{P}), S_1, S_2, A, (S_n, Y_n, Z_n, u_n^1, u_n^2, p_n^1, p_n^2)_{n\in\mathbb{N}}\big)$
be a signaling game as defined in Section 1.2. Apart from assumptions (A1)-(A3), we make two
further assumptions for our model. One concerns the prior distributions and the other is a
more detailed version of assumption (A2).

(A4) States of nature are equiprobable. In other words, for each $n \in \mathbb{N}$, $S_n$ is an
independent uniformly distributed random variable. Furthermore, $Y_0$ and $Z_0$ are
independent uniformly distributed random variables.

(A5) In addition to assumption (A2) on payoffs, we assume that payoffs are constants
depending only on the types of states and acts. More precisely,
$u_n^1(S_n, Y_n, Z_n) = u_n^2(S_n, Y_n, Z_n) = a_{Z_n, Y_n} 1_{\{S_n = Z_n\}}$, where $a_{ij}$ is a
positive constant, for $i \in S_1$, $j \in S_2$.

1.5.2 The model 

Under assumptions (A1)-(A5), we now present the model of the signaling game
we are going to study in this paper.

Suppose there are $M_1$ states, $M_2$ signals and $M_1$ acts. Let $S := S_1 \cup S_2$ and, for
all $d \in \mathbb{N}$, let $S^d := S \times \ldots \times S$.

Let $S_{pair} := \{(i,j) : i \in S_1, j \in S_2\}$ be the set of strategy pairs. Note that a
strategy pair $(i,j)$ carries different meanings for the Sender and for the Receiver. For
the Sender, it means choosing signal $j$ when he observes state $i$, while for the Receiver
it means choosing act $i$ when he receives signal $j$. This strategy pair $(i,j)$ accumulates
the same payoffs for both the Sender and the Receiver, since the corresponding rewards
are always received at the same time: let $V(n,i,j)$ denote this accumulated payoff at
time $n$. For each $n \in \mathbb{N}$, let

$$V_n := \big(V(n,i,j)\big)_{(i,j) \in S_{pair}}$$

be the payoff vector at time $n$.



Let us describe the random process $(V_n)_{n\in\mathbb{N}}$ arising from our model for the game
and its strategies in Sections 1.1-1.4.



1 Initial setting. For any $(i,j) \in S_{pair}$, we assume that $V(0,i,j) > 0$ is fixed.

2 Reinforcement learning. At each time step, Sender observes a certain state $i$
from the set of states $S_1$; we assume here that all states arise with equal probability
$1/M_1$. Then Sender randomly chooses a signal, his probability of drawing $j$ being

$$\frac{V(n,i,j)}{\sum_{k \in S_2} V(n,i,k)}.$$

Receiver observes the signal he receives (let us call it $j$) and then randomly chooses
an act $k$ with probability

$$\frac{V(n,k,j)}{\sum_{l \in S_1} V(n,l,j)}.$$

3 Updating rule. Both Sender and Receiver receive payoffs when the act chosen
by Receiver matches the state observed by Sender. For any $i \in S_1$, $j \in S_2$,

$$V(n+1,i,j) := \begin{cases} V(n,i,j) + a_{ij} & \text{if Sender observes state } i \text{ and chooses signal } j, \text{ and Receiver chooses act } i; \\ V(n,i,j) & \text{otherwise.} \end{cases}$$



We suppose in this paper that $a_{ij} := 1$ for all $i \in S_1$, $j \in S_2$. We also assume for
simplicity that $V(0,i,j) = 1$ for all $i \in S_1$, $j \in S_2$, but the proofs carry over to
general initial conditions.

1.5.3 Symmetrization and key processes



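Let us symmetrize the notation, which will simplify some proofs (in particular that
of Proposition 4.1): for all $i, j \in S$, let

$$V(n,i,j) := \begin{cases} 0 & \text{if } i,j \in S_1 \text{ or } i,j \in S_2, \\ V(n,j,i) & \text{if } j \in S_1,\ i \in S_2. \end{cases}$$

Now, for all $n \in \mathbb{N}$ and $i \in S$ state or signal, let

$$V_i^n := \sum_{j \in S} V(n,i,j)$$

be its number of successes up to time $n$.
For all $n \in \mathbb{N}$, let

$$T_n := \frac{1}{2} \sum_{i \in S} V_i^n = \sum_{i \in S_1} V_i^n = \sum_{i \in S_2} V_i^n.$$

Then $T_n - T_0$ is the total number of successes of the communication system up to time
$n$.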

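Let, for all $i, j \in S$,

$$x_{ij}^n := \frac{V(n,i,j)}{T_n}, \qquad x_i^n := \sum_{j \in S} x_{ij}^n = \frac{V_i^n}{T_n}.$$

For all $n \in \mathbb{N}$, let

$$x_n := (x_{ij}^n)_{i,j \in S}$$

be the occupation measure at time $n$, which takes values in the interior of the simplex

$$\Delta := \Big\{ (x_{ij})_{i,j \in S} : x_{ij} \ge 0,\ \sum_{i \in S_1, j \in S_2} x_{ij} = 1,\ x_{ji} = x_{ij} \text{ for all } i,j \in S \Big\}.$$

Let us define

$$\partial\Delta := \{ x \in \Delta : \exists\, i \in S \text{ s.t. } x_i = 0 \},$$

which we call boundary throughout the paper, although it is not the topological
boundary of $\Delta$. One of the technical difficulties in this model is the understanding of the
behavior of $(x_n)_{n\in\mathbb{N}}$ near the boundary, as we shall explain later.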

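Given $x \in \Delta \setminus \partial\Delta$ and $i, j \in S$, let

$$y_{ij} := \frac{x_{ij}}{x_i x_j}$$

be the efficiency of the strategy pair $(i,j)$ and let

$$N_i(x) := \sum_{k \in S} \frac{x_{ik}}{x_i}\, y_{ik}$$

be the efficiency of $i$. We will justify this notation in Section 4.1.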



Note that the processes $(x_n)_{n\in\mathbb{N}}$ and $(T_n)_{n\in\mathbb{N}}$ contain all the important
information of the communication system throughout the game. Therefore, to study how our
model evolves, we only need to focus on the evolutions of $(x_n)_{n\in\mathbb{N}}$ and
$(T_n)_{n\in\mathbb{N}}$.

1.6 Urn setting: Another way to interpret the model



Note that the reinforcement learning and updating rules 2 and 3 in Section 1.5 can
be interpreted in an urn setting. Assume Sender has $M_1$ urns indexed by states, each
of them having $M_2$ colours of balls, one per signal. Similarly assume that Receiver has
$M_2$ urns indexed by signals, each of them having $M_1$ colours of balls, one per act (or
state, since the two sets coincide).

The model corresponds to the following: Sender picks a ball at random in the
urn indexed by the state he observes, and sends the signal given by its colour. Then
Receiver picks a ball at random in the urn indexed by this signal, and chooses the act
given by its colour. Both Sender and Receiver put back the balls they picked and, if
the act matches the state, add one more ball of the same colour.
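
The urn scheme can be sketched in a few lines of code. The following Python sketch is only an illustration, not part of the paper: the function name `simulate`, its parameters and the use of the standard `random` module are assumptions made here. It takes $a_{ij} = 1$ and $V(0,i,j) = 1$, as in Section 1.5.2, and stores the common accumulated payoff $V(n,i,j)$ of both players in a single table, since Sender's urn $i$ and Receiver's urn $j$ are always reinforced together.

```python
import random


def simulate(m1, m2, steps, seed=0):
    """Run the reinforcement process with a_ij = 1 and V(0, i, j) = 1.

    Returns the table v[i][j] = V(steps, i, j) and the number of successes.
    """
    rng = random.Random(seed)
    v = [[1.0] * m2 for _ in range(m1)]
    successes = 0
    for _ in range(steps):
        # Nature draws a state uniformly at random (assumption (A4)).
        state = rng.randrange(m1)
        # Sender draws signal j with probability V(n,i,j) / sum_k V(n,i,k).
        signal = rng.choices(range(m2), weights=v[state])[0]
        # Receiver draws act k with probability V(n,k,j) / sum_l V(n,l,j).
        column = [v[k][signal] for k in range(m1)]
        act = rng.choices(range(m1), weights=column)[0]
        # Reinforce the strategy pair only on a successful communication.
        if act == state:
            v[state][signal] += 1.0
            successes += 1
    return v, successes


if __name__ == "__main__":
    table, wins = simulate(m1=3, m2=3, steps=5000, seed=1)
    print(wins)
    for row in table:
        print(row)
```

With this bookkeeping, the sum of all entries of the table minus its initial value $M_1 M_2$ is the number of successes, which is the quantity $T_n - T_0$ of Section 1.5.3.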



2 Main results 



Given $x \in \Delta$, a distribution of strategy pairs of the communication system, we
introduce in Section 2.1 the bipartite graph of state/signal connections associated to
$x$, and its communication potential or efficiency, which measures the corresponding
expected payoff up to a multiplicative constant. We present in Section 2.2 the main
results of the paper.



2.1 Bipartite graph and communication potential 

Definition 2.1. Given $x \in \Delta$, let $\mathcal{G}_x$ be the weighted bipartite graph with
vertices $S := S_1 \cup S_2$, adjacency $\sim$ and weights as follows:

(1) $\forall i \in S_1, j \in S_2$, $i \sim j$ if and only if $x_{ij} > 0$.

(2) The weight of edge $\{i,j\}$ is its efficiency $y_{ij} = x_{ij}/(x_i x_j)$.

Note that the new adjacency relation is only justified when $x$ is in the topological
boundary of $\Delta$; otherwise, $\mathcal{G}_x$ is the complete 2-partite graph with partitions
$S_1$ and $S_2$.

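Definition 2.2. Let $H : \Delta \to \mathbb{R}_+$ be the function defined by, for all $x \in \Delta$,

$$H(x) := \sum_{i \in S_1, j \in S_2 :\, x_{ij} > 0} \frac{x_{ij}^2}{x_i x_j} = \frac{1}{2} \sum_{i,j \in S :\, x_{ij} > 0} \frac{x_{ij}^2}{x_i x_j}.$$

We call $H(x)$ the communication potential or efficiency of $x$.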

Note that the communication potential of $x_n$ at time $n$ can be interpreted, up to a
multiplicative constant, as the expected payoff at that time step:

$$\mathbb{P}(T_{n+1} - T_n = 1 \mid \mathcal{F}_n) = \frac{1}{M_1} H(x_n), \qquad (3)$$

where $\mathbb{F} = (\mathcal{F}_n)_{n\in\mathbb{N}}$ is the filtration of the past, i.e.
$\mathcal{F}_n := \sigma(x_1, \ldots, x_n)$.

Lemma 2.1. $H$ has minimum 1 and maximum $\min(M_1, M_2)$ on $\Delta$.

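Proof. The Cauchy-Schwarz inequality

$$1 = \Big( \sum_{i \in S_1, j \in S_2 :\, x_{ij} > 0} x_{ij} \Big)^2 \le \Big( \sum_{i \in S_1, j \in S_2 :\, x_{ij} > 0} \frac{x_{ij}^2}{x_i x_j} \Big) \Big( \sum_{i \in S_1, j \in S_2} x_i x_j \Big) = H(x)$$

provides the first inequality, whereas the second one comes from

$$H(x) = \sum_{j \in S_2} \sum_{i \in S_1} \frac{x_{ij}}{x_j} \frac{x_{ij}}{x_i} \le \sum_{j \in S_2} \sup_{i \in S_1} \frac{x_{ij}}{x_i} \le M_2;$$

similarly $H(x) \le M_1$.
Now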
Now 

(a) $H(x)$ reaches the minimum 1 if and only if $\mathcal{G}_x$ is a complete graph on which every
edge shares the same weight 1, as displayed on the figure below.

[Figure: complete bipartite graph between states 1, 2, 3, 4 and signals A, B, C, D; all
edges with the same weight = 1.]

(b) If $M_1 \ge M_2$ (resp. $M_1 \le M_2$), then $H(x)$ reaches the maximum if and only if every
vertex $i \in S_1$ (resp. $i \in S_2$) only has one adjacent edge in $\mathcal{G}_x$. In the case
$M_1 \le M_2$, this corresponds to a unique meaning for every signal, i.e. perfect efficiency,
as displayed on the figure below.

□
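
As a numerical illustration of Definition 2.2 and Lemma 2.1, here is a small Python sketch. It is not taken from the paper: the function name `potential` and the list-of-rows representation of the block $(x_{ij})_{i \in S_1, j \in S_2}$ are assumptions made for this example. It evaluates $H$ on a uniform configuration, where the minimum 1 is expected, and on a configuration with $M_1 = 3 \le M_2 = 4$ in which every signal has a unique meaning, where the maximum $\min(M_1, M_2) = 3$ is expected.

```python
def potential(x):
    """Communication potential H(x) of Definition 2.2.

    x[i][j] is the weight of the strategy pair (state i, signal j);
    only pairs with x[i][j] > 0 contribute to the sum.
    """
    m1, m2 = len(x), len(x[0])
    row = [sum(x[i]) for i in range(m1)]                        # x_i, i in S_1
    col = [sum(x[i][j] for i in range(m1)) for j in range(m2)]  # x_j, j in S_2
    return sum(
        x[i][j] ** 2 / (row[i] * col[j])
        for i in range(m1)
        for j in range(m2)
        if x[i][j] > 0
    )


# Complete graph with equal weights: 3 states, 4 signals.
uniform = [[1 / 12] * 4 for _ in range(3)]

# Every signal has a single meaning; state 3 has two synonymous signals.
one_meaning_per_signal = [
    [1 / 3, 0.0, 0.0, 0.0],
    [0.0, 1 / 3, 0.0, 0.0],
    [0.0, 0.0, 1 / 6, 1 / 6],
]

if __name__ == "__main__":
    print(potential(uniform))
    print(potential(one_meaning_per_signal))
```

Since each term $x_{ij}^2/(x_i x_j)$ is unchanged when all entries are multiplied by the same constant, the same function can be applied directly to a table of accumulated payoffs $V(n,i,j)$.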



2.2 Main results 

Definition 2.3. Given a graph $\mathcal{G}$ on $S_1 \cup S_2$, let $(P)_{\mathcal{G}}$ be the
following property:

• if we let $C_1, \ldots, C_d$ be its connected components then, for every $i \in \{1, \ldots, d\}$,
$C_i \cap S_1$ or $C_i \cap S_2$ is a singleton.

• each vertex has a corresponding edge.

We call synonym (resp. informational bottleneck or polysemy) a state (resp. signal)
associated to several signals (resp. states or acts), or the corresponding set of adjacent
signals (resp. states). Obviously $M_1 \neq M_2$ ensures the existence of at least one synonym
or polysemy.
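
Property $(P)_{\mathcal{G}}$ is easy to test mechanically. The Python sketch below is an illustration under assumptions made here (the name `has_property_p`, states numbered from 0 to $M_1 - 1$, signals from 0 to $M_2 - 1$, and edges given as pairs (state, signal)); it is not part of the paper. It computes connected components with a union-find structure and checks the two bullet points of Definition 2.3.

```python
def has_property_p(m1, m2, edges):
    """Check property (P) for a bipartite graph on m1 states and m2 signals.

    edges is an iterable of pairs (i, j): state i is adjacent to signal j.
    """
    # Vertices 0..m1-1 are states, vertices m1..m1+m2-1 are signals.
    parent = list(range(m1 + m2))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    covered = set()
    for i, j in edges:
        covered.update((i, m1 + j))
        parent[find(i)] = find(m1 + j)

    # Second bullet: each vertex has a corresponding edge.
    if len(covered) < m1 + m2:
        return False

    # First bullet: in each component, the states or the signals
    # form a singleton.
    n_states, n_signals = {}, {}
    for u in range(m1 + m2):
        counts = n_states if u < m1 else n_signals
        root = find(u)
        counts[root] = counts.get(root, 0) + 1
    roots = set(n_states) | set(n_signals)
    return all(n_states.get(r, 0) == 1 or n_signals.get(r, 0) == 1 for r in roots)
```

For instance, a graph in which state 0 uses signals 0 and 1 while signal 1 is also shared with state 1 fails the test, since the same component then contains both a synonym and an informational bottleneck.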

Note that, given $x \in \Delta$, and even if $M_1 = M_2$, property $(P)_{\mathcal{G}_x}$ allows for
synonyms or informational bottlenecks, and does not ensure that the system is optimal as a
communication system, i.e. that $H(x)$ reaches the maximum of $H$. Most common
languages have such flaws. We show, on the figure below, a graph $\mathcal{G}$ corresponding to
a sub-optimal communication system: it is easy to check that its normalized efficiency
is 80%, i.e. that any $x \in \Delta$ such that $\mathcal{G}_x = \mathcal{G}$ is such that
$H(x)/\max H = 0.8$.




Theorem 2.1. The communication potential process $(H(x_n))_{n\in\mathbb{N}}$ is a bounded
submartingale, and hence converges a.s.

Theorem 2.2. $(x_n)_{n\in\mathbb{N}}$ converges to the set of equilibria of the mean-field ODE
almost surely.

Remark. We will define the mean-field ODE in Section 3.1. Roughly speaking, the
mean-field ODE is derived from the dynamics of the expected movement of $(x_n)_{n\in\mathbb{N}}$.

Theorem 2.3. For all $\mathcal{G}$ on $S_1 \cup S_2$ s.t. $(P)_{\mathcal{G}}$ holds, with positive
probability

(a) $x_n \to x$ s.t. $\mathcal{G}_x = \mathcal{G}$.

(b) $\forall i, j \in S$, $V(\infty, i, j) = \infty \iff \{i,j\}$ is an edge of $\mathcal{G}$.
2.3 Contents 

The rest of this paper is devoted to the proof of the main results, as follows. In
Section 3 we discuss the stochastic approximation of $(x_n)_{n\in\mathbb{N}}$ by an ordinary
differential equation. In Section 4 we justify some notation from Section 1.5, and show
Theorem 2.1 and its deterministic counterpart that $H$ is a Lyapunov function of the
associated ODE. In Section 5 we deduce that $(x_n)_{n\in\mathbb{N}}$ almost surely converges to the
set of equilibria of this differential equation, and describe the stable equilibria $x$ in
terms of the graph structure of $\mathcal{G}_x$. In Section 6 we connect our analysis of stability
of the reinforcement learning model with the one from the static equilibrium analysis
literature. Finally, we show Theorem 2.3 about convergence with positive probability towards
subgraphs $\mathcal{G}$ satisfying $(P)_{\mathcal{G}}$.



2.4 Notation 

For all $u, v \in \mathbb{R}$, we write $u = \square(v)$ if $|u| \le v$; we let
$u \wedge v = \min(u, v)$ (resp. $u \vee v = \max(u, v)$) be the minimum (resp. maximum) of
$u$ and $v$.

We let $\mathrm{Cst}(a_1, a_2, \ldots, a_p)$ denote a positive constant depending only on
$a_1, a_2, \ldots, a_p$, and let $\mathrm{Cst}$ denote a universal positive constant.



3 Stochastic approximation 



3.1 Mean-field ODE 



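Let $|\cdot|$ be the Euclidean norm on $\mathbb{R}^{(M_1+M_2)^2}$. Let us calculate the increment
of $(x_n)_{n\in\mathbb{N}}$ at time $n$:

$$x_{n+1} - x_n = \frac{V_{n+1}}{T_{n+1}} - \frac{V_n}{T_n} = \frac{1_{\{V_{n+1} - V_n \neq 0\}}}{1 + T_n} \big( V_{n+1} - V_n - x_n \big). \qquad (4)$$

In expectation,

$$\mathbb{E}[x_{n+1} - x_n \mid \mathcal{F}_n] = \frac{1}{M_1 (1 + T_n)} F(x_n), \qquad (5)$$

where $F$, defined from $\Delta$ to $T\Delta$, the tangent space of $\Delta$, maps $x$ to

$$F(x) := \Big( x_{ij} \Big( \frac{x_{ij}}{x_i x_j} - H(x) \Big) \Big)_{i,j \in S},$$

with the convention that $F(x)_{ij} = 0$ if $x_{ij} = 0$.

Let us consider the following ordinary differential equation, defined on
$\Delta \setminus \partial\Delta$:

$$\frac{dx}{dt} = F(x). \qquad (6)$$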
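
To get a feeling for the mean-field ODE (6), one can integrate it numerically. The Python sketch below uses an explicit Euler scheme with a fixed step; the step size, the starting point and the names `efficiencies` and `euler_path` are choices made for this illustration only, and the code works on the block $(x_{ij})_{i \in S_1, j \in S_2}$, normalized so that its entries sum to 1. It records $H$ along the discretized path, which gives a rough numerical view of the Lyapunov property established in Section 4.

```python
def efficiencies(x):
    """Return (y, h) with y[i][j] = x_ij / (x_i x_j) and h = sum_ij x_ij y_ij."""
    m1, m2 = len(x), len(x[0])
    row = [sum(r) for r in x]
    col = [sum(x[i][j] for i in range(m1)) for j in range(m2)]
    y = [[x[i][j] / (row[i] * col[j]) for j in range(m2)] for i in range(m1)]
    h = sum(x[i][j] * y[i][j] for i in range(m1) for j in range(m2))
    return y, h


def euler_path(x, dt=0.01, steps=2000):
    """Explicit Euler scheme for dx/dt = F(x), F(x)_ij = x_ij (y_ij - H(x)).

    Returns the final point and the list of values of H before each step.
    """
    potentials = []
    for _ in range(steps):
        y, h = efficiencies(x)
        potentials.append(h)
        x = [
            [xij * (1.0 + dt * (yij - h)) for xij, yij in zip(x_row, y_row)]
            for x_row, y_row in zip(x, y)
        ]
    return x, potentials


if __name__ == "__main__":
    # Two states, two signals, slightly biased towards the diagonal pairing.
    x_end, hs = euler_path([[0.4, 0.1], [0.1, 0.4]])
    print(hs[0], hs[-1])
    print(x_end)
```

The components of $F(x)$ sum to zero on the block, so the total mass is preserved by each Euler step up to rounding errors; the scheme is only a numerical approximation and carries no proof value.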

3.2 Results 

Lemma 3.1. There exists an adapted martingale increment process $(\eta_n)_{n\in\mathbb{N}}$ such
that, for all $n \in \mathbb{N}$,

$$x_{n+1} - x_n = \frac{1}{M_1 (1 + T_n)} F(x_n) + \eta_{n+1}, \qquad (7)$$

and $|\eta_{n+1}| \le 2/(1 + T_n)$.

Proof. The inequality $|\eta_{n+1}| \le 2/(1 + T_n)$ comes from



1 



{V„+l-Vn7^0} 



(K+1 - K - Xn) 



1 +T„ 



(8) 



□ 



The following Lemma 3.2, which proves asymptotic linear growth of $T_n$, will in
particular imply that the martingale $(\sum_{k=1}^n \eta_k)_{n\in\mathbb{N}}$ converges a.s., by
Doob's convergence theorem.

Lemma 3.2. With probability 1,

$$\frac{1}{M_1} \le \liminf_{n\to\infty} \frac{T_n}{n} \le \limsup_{n\to\infty} \frac{T_n}{n} \le \frac{\min(M_1, M_2)}{M_1}.$$

Proof. The result is a direct consequence of (3) and the conditional Borel-Cantelli lemma,
see [Theorem 1.6]. □



Remark. We show later that, as $n$ goes to infinity, $H(x_n)$ converges (Theorem 2.1),
and the proof above implies that $T_n/n$ then converges to the limit of $H(x_n)/M_1$.

Formula (7) can be interpreted as a stochastically perturbed Cauchy-Euler
approximation scheme for the ODE (6), with step size $1/(M_1(1 + T_n))$. The step size being
$O(1/n)$, $(x_n)_{n\in\mathbb{N}}$ asymptotically shadows solutions of the asymptotic ODE, so that
its limit set belongs to a class of possible limit sets of pseudotrajectories of the ODE (see
for instance [3]).

Let $\Gamma$ be the set of equilibria of the ODE, i.e.

$$\Gamma := \{ x \in \Delta : F(x) = 0 \}.$$

4 Lyapunov function



4.1 Deterministic case: Mean-field ODE 

Let us start with a heuristic justification of the fact that $H$ is a Lyapunov
function, i.e. that it increases along the trajectories of the ODE.

The differential equation (6) can be understood in the language of (non-linear)
replicator dynamics, with the following biological perspective. Suppose a population
consists of species $(i,j) \in S_1 \times S_2$, each corresponding to a strategy pair. Let the
fitness of $(i,j)$ be its efficiency $y_{ij} = x_{ij}/(x_i x_j)$. The wording is justified by the
following interpretation: the probability that population $(i,j)$ increases is

$$\frac{1}{M_1} \cdot \frac{x_{ij}}{x_i} \cdot \frac{x_{ij}}{x_j} = \frac{1}{M_1} \times \text{proportion of } (i,j) \times \text{its efficiency}.$$

Then the average fitness of the whole population is the communication potential $H(x)$.
Therefore the mean-field ODE can be understood as follows:

Growth rate of $x_{ij}$
= $x_{ij}$ × (fitness of $(i,j)$ - average fitness of the whole population).

In particular, those species (i.e. strategy pairs) whose fitness (i.e. efficiency) is above
the average increase their proportions in the population (i.e. distribution of strategy
pairs). Note that the fitness of our species changes over time.

This interpretation makes it reasonable to conjecture that the average fitness of the
whole population indeed increases along solutions of the ODE (6), as we show in Proposition
4.1 and, in the (discrete) stochastic case, in Theorem 2.1, proved in Section 4.2. The
proof of the latter is technically quite long, owing to the non-continuity of $H$ on the
boundary $\partial\Delta$, which prevents us from converting the deterministic Proposition 4.1
into a stochastic one via a simple Taylor formula.



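Let, for all $x \in \Delta$,

$$p(x) := \frac{1}{2} \sum_{i,j,k \in S :\, x_{ij}, x_{ik} \neq 0} \frac{x_{ij} x_{ik}}{x_i} (y_{ij} - y_{ik})^2.$$

Proposition 4.1. $H$ is a Lyapunov function on $\Delta \setminus \partial\Delta$ for the mean-field
ODE (6); more precisely,

$$\nabla H \cdot F(x) = p(x) \ge 0. \qquad (9)$$

Remark. We will see later that $H$ is not a strict Lyapunov function; in other words,
$\nabla H \cdot F$ does not only vanish on the set of rest points of $F$.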

Proof. We take advantage of the symmetrical notation introduced at the end of Section
1.5: from a mathematical perspective, there is no difference between state and signal in
the mean-field ODE. Let us now differentiate $H$ along a path of the ODE: note that,
when differentiating with respect to space variables, we are obviously not restricted to
$\Delta$, so that $x_{ij}$ and $x_{ji}$ are considered as independent variables (in the calculation
below, we use the convention that $x_i = \sum_{j \in S} x_{ij}$; the other convention would lead to
the same result). A direct computation gives $\partial H / \partial x_{ij} = y_{ij} - N_i(x)$, so that

$$\nabla H \cdot F(x) = \sum_{i,j \in S} \big( y_{ij} - N_i(x) \big)\, x_{ij} \big( y_{ij} - H(x) \big)$$
$$= \sum_{i,j,k \in S} \frac{x_{ij} x_{ik}}{x_i}\, y_{ij} (y_{ij} - y_{ik}) \qquad (10)$$
$$\quad - H(x) \sum_{i,j \in S} x_{ij} \big( y_{ij} - N_i(x) \big). \qquad (11)$$

Using that $\sum_{k \in S} x_{ik}/x_i = 1$,

$$(11) = -H(x) \sum_{i \in S} x_i N_i(x) \Big( 1 - \sum_{k \in S} \frac{x_{ik}}{x_i} \Big) = 0.$$

Using the symmetry between $j$ and $k$, we obtain

$$(10) = \frac{1}{2} \sum_{i,j,k \in S} \frac{x_{ij} x_{ik}}{x_i} (y_{ij} - y_{ik})^2 = p(x). \qquad (12)$$



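□

Lemma 4.1. For any $x \in \Delta \setminus \partial\Delta$,

$$\nabla H \cdot F(x) = \sum_{i,j \in S} x_{ij} \big( y_{ij} - N_i(x) \big)^2, \qquad (13)$$

which can also be written as

$$\nabla H \cdot F(x) = \sum_{i,j \in S} x_{ij} \big( y_{ij} - H(x) \big)^2 - \sum_{i \in S} x_i \big( N_i(x) - H(x) \big)^2. \qquad (14)$$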



Remark. In the context of communication systems, the above three formulas (9),
(13)-(14) mean that the growth rate of the communication potential is a function
depending on the differences between efficiencies of different strategy pairs.



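Proof. Fix $i \in S$, and define $Y : S \to \mathbb{R}$ by $Y(k) := y_{ik}$, seen as a random variable
on $(S, \mathbb{P}_{x,i})$ (with expectation $\mathbb{E}_{x,i}(\cdot)$), where
$\mathbb{P}_{x,i}(\{k\}) := x_{ik}/x_i$. Then

$$\frac{1}{2} \sum_{j,k \in S} \frac{x_{ij} x_{ik}}{x_i^2} (y_{ij} - y_{ik})^2 = \mathbb{E}_{x,i}\big[ (Y - \mathbb{E}_{x,i}(Y))^2 \big] = \sum_{j \in S} \frac{x_{ij}}{x_i} \big( y_{ij} - N_i(x) \big)^2,$$

using that

$$\mathbb{E}_{x,i}(Y) = \sum_{k \in S} \frac{x_{ik}}{x_i}\, y_{ik} = N_i(x).$$

Therefore, multiplying by $x_i$, summing over $i \in S$ and using (9),

$$\nabla H \cdot F(x) = p(x) = \sum_{i,j \in S} x_{ij} \big( y_{ij} - N_i(x) \big)^2,$$

which is (13). Expanding $(y_{ij} - H(x))^2 = (y_{ij} - N_i(x) + N_i(x) - H(x))^2$ and using
$\sum_{j \in S} x_{ij} (y_{ij} - N_i(x)) = 0$, this implies that

$$\nabla H \cdot F(x) = \sum_{i,j \in S} x_{ij} \big( y_{ij} - H(x) \big)^2 - \sum_{i \in S} x_i \big( N_i(x) - H(x) \big)^2,$$

and completes the proof.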
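
The identities (9), (13) and (14) can be checked numerically at a random interior point. The Python sketch below is a sanity check written for this purpose, not the authors' computation: the function name `check_identities`, the random test point and the finite-difference step are assumptions. It works on the block $(x_{ij})_{i \in S_1, j \in S_2}$, so each sum over $i, j \in S$ is split into a part where $i$ is a state and a part where $i$ is a signal, and it compares the three closed forms with a central finite-difference estimate of the derivative of $H$ in the direction $F(x)$.

```python
import random


def check_identities(m1, m2, seed=0, eps=1e-6):
    """Return ((13), (14), p(x), finite-difference estimate of grad H . F)."""
    rng = random.Random(seed)
    raw = [[rng.uniform(0.1, 1.0) for _ in range(m2)] for _ in range(m1)]
    total = sum(map(sum, raw))
    x = [[value / total for value in r] for r in raw]
    pairs = [(i, j) for i in range(m1) for j in range(m2)]

    def parts(z):
        row = [sum(r) for r in z]
        col = [sum(z[i][j] for i in range(m1)) for j in range(m2)]
        y = [[z[i][j] / (row[i] * col[j]) for j in range(m2)] for i in range(m1)]
        h = sum(z[i][j] * y[i][j] for i, j in pairs)
        return row, col, y, h

    row, col, y, h = parts(x)
    # Efficiencies N_i of states and N_j of signals.
    n_st = [sum(x[i][j] * y[i][j] for j in range(m2)) / row[i] for i in range(m1)]
    n_sg = [sum(x[i][j] * y[i][j] for i in range(m1)) / col[j] for j in range(m2)]

    f13 = sum(x[i][j] * ((y[i][j] - n_st[i]) ** 2 + (y[i][j] - n_sg[j]) ** 2)
              for i, j in pairs)
    f14 = (2.0 * sum(x[i][j] * (y[i][j] - h) ** 2 for i, j in pairs)
           - sum(row[i] * (n_st[i] - h) ** 2 for i in range(m1))
           - sum(col[j] * (n_sg[j] - h) ** 2 for j in range(m2)))
    p = 0.5 * (
        sum(x[i][j] * x[i][k] / row[i] * (y[i][j] - y[i][k]) ** 2
            for i in range(m1) for j in range(m2) for k in range(m2))
        + sum(x[i][j] * x[k][j] / col[j] * (y[i][j] - y[k][j]) ** 2
              for j in range(m2) for i in range(m1) for k in range(m1))
    )

    # Central finite difference of H along the direction F(x).
    f = [[x[i][j] * (y[i][j] - h) for j in range(m2)] for i in range(m1)]
    plus = [[x[i][j] + eps * f[i][j] for j in range(m2)] for i in range(m1)]
    minus = [[x[i][j] - eps * f[i][j] for j in range(m2)] for i in range(m1)]
    fd = (parts(plus)[3] - parts(minus)[3]) / (2.0 * eps)
    return f13, f14, p, fd


if __name__ == "__main__":
    print(check_identities(3, 4))
```

The four returned numbers should agree up to the discretization error of the finite difference, and they should be nonnegative, in line with Proposition 4.1.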

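Lemma 4.2. For all $x \in \Delta \setminus \partial\Delta$ and $i \in S$, $N_i(x) \ge 1$.

Proof. Indeed,

$$x_i^2 N_i(x) = x_i \sum_{j \in S} x_{ij}\, y_{ij} = \sum_{j \in S} \frac{x_{ij}^2}{x_j},$$

which subsequently implies, by the Cauchy-Schwarz inequality, that

$$x_i^2 = \Big( \sum_{j \in S :\, x_{ij} > 0} x_{ij} \Big)^2 \le \Big( \sum_{j \in S :\, x_{ij} > 0} \frac{x_{ij}^2}{x_j} \Big) \Big( \sum_{j \in S :\, x_{ij} > 0} x_j \Big) \le x_i^2 N_i(x). \qquad (15)$$

□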



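Let us define

$$\Delta_\varepsilon := \{ x \in \Delta \setminus \partial\Delta : p(x) > \varepsilon \},$$

and let

$$\Lambda := \{ x \in \Delta : p(x) = 0 \},$$

where $p$ is the function appearing in the statement of Proposition 4.1. The following
Lemma 4.3 is straightforward.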

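Lemma 4.3. $x \in \Lambda$ if and only if

$$\frac{x_{ij}}{x_j} = \frac{x_{ik}}{x_k} \quad \text{for all } i, j, k \text{ s.t. } x_{ij} \neq 0,\ x_{ik} \neq 0,$$

or, equivalently,

$$y_{ij} = y_{ik} \quad \text{for all } i, j, k \text{ s.t. } x_{ij} \neq 0,\ x_{ik} \neq 0.$$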



Remark. Lemma 4.3 can be phrased as follows: if $x \in \Lambda$ then, in the graph
$\mathcal{G}_x$, edges within the same connected component have the same weight. Note that
$x \in \Gamma$ is equivalent to all edges of $\mathcal{G}_x$ having the same weight $H(x)$. So the
two sets $\Gamma$ and $\Lambda$ are different, i.e. $H$ is not a strict Lyapunov function, which
justifies the need to prove separately the convergence to the set of equilibria in
Section 5.1.

4.2 Proof of Theorem 2.1 and convergence to $\Lambda$

4.2.1 Proof of Theorem 2.1

For simplicity, we let $V := V_n$ in the following calculation. Let us compute the
expected increment of $(H(x_n))_{n\in\mathbb{N}}$.

E[H{Xn+l)-H{Xn)\Tn] (16) 



T 7-0 ^ , T ,-0 



^1 VI f _ VI 

+ ^^^^^ + ^^-^^ 

+(\/. + i)(v,- + 1) + {V + m (v, + m) 

,,lre.3 ^^'^^^ + 1)^^-^^ 2 ^^^^^^ i/.i/, + m + 1) 



(17) 
(18) 



25 



Now let us prove (17) is nonnegative. Indeed, 




Next we show that (18) is nonnegative as well:



^ = -2 J2 



{ij)6-S2 



E 

(»,i)ecS- 

E 
E 



w,(v; + i)(\/, + i) V V, 



2V- ■ V^- 



2V^- 



1 - 



w,(^^ + i)(v;- + i) V 



1 - 



V,. 



V- V^- 



+ 



1 - 



> 0. 



4.2.2 Convergence to $\Lambda$

Let us now prove the following Proposition 4.2.

Proposition 4.2. $(x_n)_{n\in\mathbb{N}}$ converges to $\Lambda$ a.s.; more precisely,
$(p(x_n))_{n\in\mathbb{N}}$ converges to 0 a.s.

Proof. Let us define the process $F_n := H(x_n)$, $n \in \mathbb{N}$. We decompose
$(F_n)_{n\in\mathbb{N}}$ into a martingale $(M_n)_{n\in\mathbb{N}}$ and a predictable process
$(A_n)_{n\in\mathbb{N}}$, where $A_{n+1} - A_n = \mathbb{E}[F_{n+1} - F_n \mid \mathcal{F}_n]$.
Since $H$ is bounded, the martingale $(M_n)_{n\in\mathbb{N}}$ is bounded from above and hence
converges.
Let



1 ^ VWt f V": v." ^ ^ 



Hence 



The rest of the argument is similar to the proof of convergence to the set of
equilibria in [1]. If $(x_n)_{n\in\mathbb{N}}$ were infinitely often away from $\Lambda$, then the drift
would cause $(H(x_n))_{n\in\mathbb{N}}$ to go to infinity, hence contradicting the boundedness of
$H$. Indeed,
let S be the distance between and the complement of Ag/2- Suppose Xn G A^, 
Xn+i, Xn+k-1 e Ai \ Ae and Xn+k e Ai, 



An^k-A„ ^ Y: (Pr + Qr) > E^^= E^^iE 

Therefore, 



r=n r=n r=n r=n 



n+k—1 n+k—1 



^ ^ l^^r+l^^rl ^ ^ j_T ^ ~ {A-n+k ~ An) ■ 



r=l ^=1 



Therefore $x_n \in \Delta_\varepsilon$ infinitely often would cause $A_n$ to increase to infinity. By
contradiction, $(x_n)_{n\in\mathbb{N}}$ must converge to $\Lambda$. □

Remark. The proof of convergence of $(x_n)_{n\in\mathbb{N}}$ to $\Lambda$ would also hold on
(deterministic) solutions of the ODE.

Corollary 4.1. Almost surely,

$$\frac{T_n}{n} \to \lim_{n\to\infty} \frac{H(x_n)}{M_1} \in \Big[ \frac{1}{M_1}, \frac{\min(M_1, M_2)}{M_1} \Big] \quad \text{as } n \to \infty.$$

Proof. Same as Lemma 3.2. □



5 Equilibria 



5.1 Convergence to the set of equilibria: Proof of Theorem 2.2

We already know that the occupation measure $(x_n)_{n\in\mathbb{N}}$ a.s. converges to $\Lambda$; the
goal of this section is to prove that, more precisely, $(x_n)_{n\in\mathbb{N}}$ a.s. converges to the
set of equilibria $\Gamma$ of the ODE.

Lemma 5.1. Suppose e small enough and x G A^4. Then for any i G Si, j G S2 s.t. 



Vij - Ni{x) I < ^ , and \ yij - Nj{x) \ < | 



Proof. Follows directly from (13). □ 
Lemma 5.2. 



n+l 

~ M 



\f ^0 [fio - ^^(^") - + E{xS) + r„+i + (19)
where (r„)„^i is predictable, IE[Q'^ l-^n] = and



1 2 

\C^+^ < 



K+i\ < 



6(x^ + xp 12 



Proof. 



T/ra+lT^n+l 



» 3 



i- 



- 1 



and 



(|2ii) = T^vi'v^ iiAy"+i>o + v;^v;"v;" Iat„+,>o + ^r^" Iak'^+^ 



Hence 



1(21)1 ^ GT^V^'K 



and 



E 



(21) 



T 

'J 11. 



MiTr 



By the following simple estimate 



V^VJ" 1 1 

- 1 ^ XT + 



T/n+l T/"+l ' 
we deduce that



and 



Let 



Fn+ll 



E 



^ ^ ^ — - — ^ 



(21) X (22) ^ 



Therefore 



12 



TnX^ Xj 



$$U_{ij}(\varepsilon) := \{ x \in \Delta : x_{ij} \le \varepsilon \text{ or } y_{ij} - H(x) \ge -\varepsilon \}.$$



Lemma 5.3. Assume $\varepsilon > 0$ is small enough and $m_0 \in \mathbb{N}$ large enough. Let



m=mo 



6m/ 



m=mo 



Then 



(a) $(R_n)_{n\in\mathbb{N}}$ (resp. $(S_n)_{n\in\mathbb{N}}$) is a submartingale (resp. supermartingale).

(b) $\limsup_{n \ge m,\, m \to \infty} (R_n - R_m)^- = \limsup_{n \ge m,\, m \to \infty} (S_n - S_m)^+ = 0$.

Proof. First we note that if x„ ^ Uij{e), 



This implies that $(S_n)_{n\in\mathbb{N}}$ is a supermartingale. Now we prove that
$(R_n)_{n\in\mathbb{N}}$ is a submartingale.



Assume e > small enough and x„ ^ Uij{e) U Ae4. Then Lemma 5.1 implies that 



Hence, 



X e 12 



MiTnX^x"} V 3 T^x^Xj" 

^ ^ — , 

QMiTnX'^x] Qn 

if n ^ 144Mi/e^ (which implies Tn ^ and therefore T^x^x^ ^ 72/(x7je) 

Let us now prove (6). Let 

n 

m=mo 
n 

• = Sn — ^ ^ E [ ~ »S'm- 1 I J' m- 1 ] • 



m=mo 
By (20), we note that for all $n \ge m_0$,

12 48M2 

Therefore (n„)„gN is bounded in and hence converges. We can obtain similar bounds 
for which converges as well. This completes the proof. □ 

Lemma 5.4. Let $\varepsilon > 0$, and assume $n \in \mathbb{N}$ is sufficiently large (depending on
$\varepsilon$). If $x_n \in U_{ij}(\varepsilon)$, $|H(x_{n+1}) - H(x_n)| < \varepsilon/2$ and
$T_n \ge n/(2M_1)$, then $x_{n+1} \in U_{ij}(2\varepsilon)$.

Proof Let Xn G Uij{e). Then (jsj) implies 



,+1 „, . 2Mi 



n 



Assume that n is large enough. If x^- ^ e, then x^^^ ^ 2e. Otherwise x^- > e and 



y^j — H{xn) ^ — e; assuming T„ ^ n/i2Mi) and using (|23|), we have 



and, by assumption \H{xn+i) — H{xn)\ < e/2, so that we conclude that y""*"^ — 



H{xn+i) ^ -2e. 



□ 



Lemma 5.5.

$$\limsup_{n\to\infty} x_{ij}^n \big( y_{ij}^n - H(x_n) \big)^- = 0.$$

Proof. We fix $\varepsilon > 0$ and $m_0 \in \mathbb{N}$, and let $\tau_{m_0}$ be the stopping time

r ri 6 ^ 

r^o := inf |n ^ mo : x„ ^ A,4 or < or |if(x„) - i/(xmo)l > 4/- 

We only need to show that almost surely, either $\tau_{m_0} < \infty$ or $x_n \in U_{ij}(3\varepsilon)$
for all $n$ large enough. This will complete the proof since we know from Theorem 2.1, Lemma
3.2 and Proposition 4.2 that there exists almost surely $m_0 \in \mathbb{N}$ s.t.
$\tau_{m_0} = \infty$.



Let $\sigma_{m_0}$ be the stopping time

\[ \sigma_{m_0} := \inf\{n \ge m_0 : x_n \in U_{ij}(\epsilon)\}. \]



Lemma 5.3 (b) implies that there exists a.s. a (random) $m_0 \in \mathbb{N}$ such that, for all $n \ge m \ge m_0$,



(Rn-Rm)-^^, {Sn-Sj^^^. (24) 



Therefore $\sigma_{m_0} < \infty$ or $\tau_{m_0} < \infty$, using $\sum_{n \ge m_0} 1/n = \infty$ and the observation that $x^n_{ij}$
is bounded.



For all $n \in [\sigma_{m_0}, \tau_{m_0})$, let $p_n$ be the largest $k \le n$ such that $x_k \in U_{ij}(\epsilon)$. By (24)



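Now $x_{p_n+1} \in U_{ij}(2\epsilon)$ (by Lemma 5.4): let us assume for instance that $y^{p_n+1}_{ij} - H(x_{p_n+1}) \ge -2\epsilon$. Together with $|H(x_n) - H(x_{p_n+1})| \le \epsilon/2$ (by $n < \tau_{m_0}$), deduce that

\[ y^n_{ij} - H(x_n) \ge y^{p_n+1}_{ij} - H(x_{p_n+1}) - \epsilon \ge -3\epsilon. \]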
With a similar argument, $x^{p_n+1}_{ij} \le 2\epsilon$ implies $x^n_{ij} \le 3\epsilon$. So overall, $x_n \in U_{ij}(3\epsilon)$ if
$\sigma_{m_0} \le n < \tau_{m_0}$, which enables us to conclude.

□ 

Proof of Theorem 2.2.

2H{x^) = Vl^l 

Hence, $\lim_{n\to\infty} x^n_{ij}\,(y^n_{ij} - H(x_n))^- = 0$ implies

\[ \lim_{n\to\infty} x^n_{ij}\,(y^n_{ij} - H(x_n)) = 0. \]

Lemma 5.5 enables us to conclude.



5.2 Bipartite graph structure 



Let us recall the bipartite graph defined in Section 3 (see Definition 2.1): any
$x \in \Delta$ is associated with a weighted bipartite graph $\mathcal{G}_x$ with vertices $S := S_1 \cup S_2$,
adjacency $\sim$ and weights as follows:

(1) $\forall\, i \in S_1$, $j \in S_2$, $i \sim j$ if and only if $x_{ij} > 0$.

(2) The weight of edge $\{i,j\}$ is $x_{ij}/(x_i x_j)$.



Let $C_1, \dots, C_d$ be the connected components of $\mathcal{G}_x$. Besides the bipartite graph
defined above, let us also discuss two other possible ways to assign weights:

Graph $\mathcal{G}'_x$: Edge $e_{ij}$ has weight $x_{ij}/x_j$, $i \in S_1$, $j \in S_2$.
Graph $\mathcal{G}''_x$: Edge $e_{ij}$ has weight $x_{ij}/x_i$, $i \in S_1$, $j \in S_2$.



By Lemma 4.3, we observe some interesting properties of these three graphs when
$x \in \Lambda$, in particular $x \in \Gamma$:

On $\mathcal{G}_x$: All edges in a component $C_k$, $k = 1, \dots, d$, have the same weight $\lambda_k$. Hence,

\[ H(x) = \sum_{k=1}^{d} \sum_{i \in S_1 \cap C_k} \lambda_k x_i = \sum_{k=1}^{d} \sum_{j \in S_2 \cap C_k} \lambda_k x_j. \]

Furthermore, if $x$ is an equilibrium, all the edges in $\mathcal{G}_x$ have the same weight,
which equals $H(x)$.

On $\mathcal{G}'_x$: Every edge linked with the same state $i$ has the same weight, which we denote
by $k_i$. Also for each signal $j$, the sum of weights of edges linked to $j$ is equal to 1,
so that $H(x) = \sum_{i=1}^{M_1} k_i$.

On $\mathcal{G}''_x$: Every edge linked with the same signal $j$ has the same weight, which we
denote by $k'_j$. Also for each state $i$, the sum of the weights of edges linked to $i$ is
equal to 1, so that $H(x) = \sum_{j=1}^{M_2} k'_j$.
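
As an illustration, the three weightings can be computed on a small example. The sketch below assumes $H(x) = \sum_{i,j} x_{ij}^2/(x_i x_j)$ with $x_i$, $x_j$ the row and column sums of $x$; the function names and the example matrix (two states, three signals, with two synonymous signals for the second state) are hypothetical and chosen only for this check.

```python
from fractions import Fraction

def marginals(x):
    """Row sums x_i (states) and column sums x_j (signals) of the matrix x."""
    rows = [sum(r) for r in x]
    cols = [sum(c) for c in zip(*x)]
    return rows, cols

def edge_weights(x):
    """Weights x_ij / (x_i x_j) on the edges (x_ij > 0) of the bipartite graph."""
    rows, cols = marginals(x)
    return {(i, j): x[i][j] / (rows[i] * cols[j])
            for i in range(len(x)) for j in range(len(x[0])) if x[i][j] > 0}

def H(x):
    """H(x) = sum over edges of x_ij * (x_ij / (x_i x_j))."""
    return sum(x[i][j] * w for (i, j), w in edge_weights(x).items())

# Hypothetical example: state 0 uses signal 0; state 1 splits between signals 1 and 2.
x = [[Fraction(1, 2), 0, 0],
     [0, Fraction(1, 4), Fraction(1, 4)]]

rows, cols = marginals(x)
weights = edge_weights(x)                                   # weights x_ij / (x_i x_j)
by_signal = {e: x[e[0]][e[1]] / cols[e[1]] for e in weights}  # weights x_ij / x_j
by_state = {e: x[e[0]][e[1]] / rows[e[0]] for e in weights}   # weights x_ij / x_i
```

On this example every edge weight $x_{ij}/(x_i x_j)$ coincides with $H(x)$, as expected at an equilibrium.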

5.3 Properties of Lyapounov function 

We now show that $H$ is constant on each connected component of $\Gamma$. Since it is not
continuous on the boundary, we first prove in Lemma 5.6 that it takes a constant value
on connected subsets of $\Gamma$ with the same support (defined below) by a differentiability
argument, and then conclude in Proposition 5.1 by a continuity argument on the set
of equilibria.



35 



Let 



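For any $x \in \Delta$, we define its support

\[ S_x := \{(i,j) : i \in S_1,\ j \in S_2,\ x_{ij} > 0\}. \]

$\Theta$ can be used as an index set to divide $\Delta$ into several parts in the following sense: for
any $\theta \in \Theta$,

\[ \Delta_\theta := \{x \in \Delta : S_x = \theta\}, \qquad \Gamma_\theta := \Delta_\theta \cap \Gamma. \]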

Lemma 5.6. For any $\theta \in \Theta$, $H$ is constant on each component of $\Gamma_\theta$.

Proof. Given $q \in \Gamma_\theta$, let us differentiate $H$ at $q$ with respect to $x_{ij} = x_{ji}$, $(i,j) \in S_q$,
without the constraint $x \in \Delta$:




= 0. 
The penultimate equality comes from the fact that $q_{ij}/(q_i q_j) = H(q)$ if $(i,j) \in S_q$, $q \in \Gamma$. □
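
For the reader's convenience, the differentiation can be sketched as follows. This is a reconstruction under the assumption that $H(x) = \sum_{l \in S_1, k \in S_2} x_{lk}^2/(x_l x_k)$, with $x_l = \sum_k x_{lk}$ and $x_k = \sum_l x_{lk}$; it is not claimed to reproduce the authors' own display.

```latex
\begin{align*}
\frac{\partial H}{\partial x_{ij}}(q)
  &= \frac{2 q_{ij}}{q_i q_j}
     - \sum_{k \in S_2} \frac{q_{ik}^2}{q_i^2\, q_k}
     - \sum_{l \in S_1} \frac{q_{lj}^2}{q_l\, q_j^2} \\
  &= 2 H(q)
     - H(q) \sum_{k \in S_2} \frac{q_{ik}}{q_i}
     - H(q) \sum_{l \in S_1} \frac{q_{lj}}{q_j} \\
  &= 2 H(q) - H(q) - H(q) = 0 .
\end{align*}
```

The second line uses $q_{ik}/(q_i q_k) = H(q)$ on edges (terms with $q_{ik} = 0$ vanish), and the third uses $\sum_k q_{ik} = q_i$ and $\sum_l q_{lj} = q_j$.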

Proposition 5.1. $H$ is constant on each connected component of $\Gamma$.

Proof. Let us show that $H$ is continuous on $\Gamma$, which will enable us to conclude. Indeed,
suppose that $q \in \Gamma$, and that $x \in \Gamma$ is in the neighbourhood of $q$ within $\Delta$; then
$S_x \supseteq S_q$ and, using $x \in \Gamma$,



1 



so that 



H{x) = — ^^^^ y ^ 



and the conclusion follows. □ 

5.4 Classification of equilibria and stability 

5.4.1 Jacobian matrix 

At any equilibrium $x \in (\Delta \setminus \partial\Delta) \cap \Gamma$ ($F$ is not differentiable on $\partial\Delta$), we calculate
the Jacobian matrix



J. 



X 



where, by a slight abuse of notation,

\[ F(x) = (F_{ij}(x))_{(i,j) \in S_{\mathrm{pair}}}. \]
For all $(l,k) \in S_{\mathrm{pair}}$, a simple extension of the calculation in the proof of Lemma 5.6 yields $\frac{\partial H}{\partial x_{lk}}(x) = -\mathbb{1}_{\{x_{lk}=0\}}\, 2H(x)$, so that



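\[
\frac{\partial F_{lk}}{\partial x_{ij}}(x)
= \mathbb{1}_{\{(i,j)=(l,k),\ x_{lk}=0\}}\,(y_{lk} - H(x)) + x_{lk}\frac{\partial y_{lk}}{\partial x_{ij}}(x) - x_{lk}\frac{\partial H}{\partial x_{ij}}(x)
\]
\[
= H(x)\Bigl[\mathbb{1}_{\{(i,j)=(l,k)\}}\bigl(\mathbb{1}_{\{x_{ij}\neq 0\}} - \mathbb{1}_{\{x_{ij}=0\}}\bigr) - \mathbb{1}_{\{i=l\}}\frac{x_{lk}}{x_l} - \mathbb{1}_{\{j=k\}}\frac{x_{lk}}{x_k} + 2 x_{lk}\,\mathbb{1}_{\{x_{ij}=0\}}\Bigr].
\]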



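Therefore, for any $(i,j) \in S_{\mathrm{pair}}$ s.t. $x_{ij} \neq 0$,

\begin{align}
\frac{\partial F_{ij}}{\partial x_{ij}} &= H(x)\Bigl(1 - \frac{x_{ij}}{x_i} - \frac{x_{ij}}{x_j}\Bigr); \tag{25}\\
\frac{\partial F_{ik}}{\partial x_{ij}} &= -H(x)\,\frac{x_{ik}}{x_i}, \quad k \in S_2 \setminus \{j\}; \tag{26}\\
\frac{\partial F_{lj}}{\partial x_{ij}} &= -H(x)\,\frac{x_{lj}}{x_j}, \quad l \in S_1 \setminus \{i\}; \tag{27}\\
\frac{\partial F_{lk}}{\partial x_{ij}} &= 0, \quad l \in S_1 \setminus \{i\},\ k \in S_2 \setminus \{j\}; \tag{28}
\end{align}

for any $(i,j) \in S_{\mathrm{pair}}$ s.t. $x_{ij} = 0$,

\begin{align}
\frac{\partial F_{ij}}{\partial x_{ij}} &= -H(x); \tag{29}\\
\frac{\partial F_{lk}}{\partial x_{ij}} &= 0, \quad l \in S_1,\ k \in S_2,\ (l,k) \neq (i,j),\ x_{lk} = 0. \tag{30}
\end{align}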
(30) 



Let $C_1, \dots, C_d$ be the connected components of the edges of $\mathcal{G}_x$. Let

\[ J^n_x := \Bigl(\frac{\partial F_{lk}}{\partial x_{ij}}\Bigr)_{(i,j),(l,k) \in C_n}. \]
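Therefore, using (25)-(30), $J_x$ can be written as follows, by putting first $(i,j)$ and $(l,k)$
coordinates such that $x_{ij} \neq 0$ and $x_{lk} \neq 0$ (in the same order, with increasing connected
components $C_1, \dots, C_d$), rows being indexed by $(i,j)$ and columns by $(l,k)$:

\[
J_x = \left(\begin{array}{ccc|ccc}
J^1_x & & (0) & & & \\
 & \ddots & & & (0) & \\
(0) & & J^d_x & & & \\ \hline
 & & & -H(x) & & (0) \\
 & (*) & & & \ddots & \\
 & & & (0) & & -H(x)
\end{array}\right)
\]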



5.4.2 Classification of equilibria based on stability 

Let us introduce a few definitions of stability for ordinary differential equations. 

Definition 5.1. $x$ is Lyapounov stable if for any neighbourhood $U_1$ of $x$, there exists a
neighbourhood $U_2 \subseteq U_1$ of $x$ such that any solution $x(t)$ starting in $U_2$ is such that $x(t)$
remains in $U_2$ for all $t \ge 0$.

Definition 5.2. $x$ is asymptotically stable if it is Lyapounov stable and there exists a
neighbourhood $U_1$ such that any solution $x(t)$ starting in $U_1$ is such that $x(t)$ converges
to $x$.



An equilibrium that is Lyapounov stable but not asymptotically stable is sometimes called
neutrally stable.

Definition 5.3. $x$ is linearly stable if all eigenvalues of the Jacobian matrix at $x$ have
nonpositive real part; otherwise, $x$ is called linearly unstable.

Remark that, with these definitions, linear stability allows for eigenvalues to have
zero real part, and therefore does not necessarily imply Lyapounov stability. However,
the dynamics considered here makes these stable equilibria indeed Lyapounov stable
when they do not lie on the boundary of $\Delta$, which is the case we are studying here, as can be shown
with the help of an entropy function; Section 7 on convergence with positive probability
to stable configurations can be understood as a consequence of this property in the
nondeterministic case.

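Definition 5.4. Let

\[ \Gamma_0 := \Gamma \cap \Delta \setminus \partial\Delta, \qquad \Gamma_b := \Gamma \cap \partial\Delta, \]

and let $\Gamma_s$ (resp. $\Gamma_u$) be the set of linearly stable (resp. unstable) equilibria in $\Gamma_0$ for
the mean-field ODE.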

For any a; G F^, let 

8.^ := {e e . _ 1 and 3 {i,j) G s.t. 6 ■ e,j > 0}. 

Proposition 5.2. We have

(a) $\Gamma_s = \{x \in \Gamma_0 : (P)_{\mathcal{G}_x} \text{ holds}\}$.

(b) If $x \in \Gamma_u$, then there exists an eigenvector in $\mathcal{E}_x$ whose eigenvalue has positive real
part.

(c) For all $x \in \Gamma_u$ and $\theta \in \mathcal{E}_x \setminus \{0\}$, there exists a neighbourhood $\mathcal{N}(x)$ of $x$ such that,
if $x_n \in \mathcal{N}(x)$, then



where r]n+i is the martingale increment defined in 

Note that (b)-(c) will be used to show that $(x_n)_{n\in\mathbb{N}}$ stochastically "perturbs enough"
the ODE (6), in order to prove nonconvergence to unstable equilibria.



To prove Proposition 5.2, we need the following Lemma 5.7 on the structure of $\mathcal{G}_x$
when $x \in \Gamma_0$, and the elementary Lemma 5.8.



Lemma 5.7. For any $x \in \Gamma_0$ such that $(P)_{\mathcal{G}_x}$ does not hold, $\mathcal{G}_x$ has at least one connected
component on which every vertex has at least two edges.

Proof. First, the assumption on $x$ implies that there exists a connected component $C$ with at least
two states and two signals. Assume, by contradiction, that signal $j$ is in $C$ and $j$ is only
linked to one state, for instance state $i$. Then $x_{ij}/x_j = 1$, and for all $k \in S_2$ s.t. $x_{ik} > 0$,

\[ \frac{x_{ik}}{x_k} = \frac{x_{ij}}{x_j} = 1, \]

i.e. $k$ is only linked with one edge. It implies $C \cap S_1 = \{i\}$, which contradicts our
assumption. □

Lemma 5.8. If a random variable $\omega$ satisfies that $E[\omega] < \infty$, $P(\omega = a) \ge p$ and
$P(\omega = b) \ge p$, then

\[ \mathrm{Var}(\omega) \ge \frac{(b-a)^2\, p}{4}. \]

Proof.

\[ \mathrm{Var}(\omega) \ge p\,\bigl((a - E[\omega])^2 \vee (b - E[\omega])^2\bigr) \ge \frac{(b-a)^2\, p}{4}. \]



□ 
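
The inequality of Lemma 5.8 can be checked on small finite distributions. The sketch below is only an illustration: the constant $1/4$ is the one obtained from the maximum in the proof and is treated here as an assumption, and the function names and the example distributions are hypothetical.

```python
from fractions import Fraction

def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())

def two_atom_bound(a, b, p):
    """Assumed lower bound p * (b - a)^2 / 4 when P(w = a) >= p and P(w = b) >= p."""
    return p * (b - a) ** 2 / 4

# Hypothetical examples: atoms at a = 0 and b = 1, each with mass at least p.
fair_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
three_atoms = {0: Fraction(1, 4), 1: Fraction(1, 4), 5: Fraction(1, 2)}

checks = [
    (variance(fair_coin), two_atom_bound(0, 1, Fraction(1, 2))),
    (variance(three_atoms), two_atom_bound(0, 1, Fraction(1, 4))),
]
```

In both cases the variance dominates the bound, which is all that part (c) of Proposition 5.2 requires.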



Proof of Proposition 5.2. (a)-(b) Suppose that $x \in \Gamma_0$ and that $(P)_{\mathcal{G}_x}$ does not hold,
and let us prove that $x \in \Gamma_u$. Lemma 5.7 implies that $\mathcal{G}_x$ has a connected component
on which every vertex has at least two edges, which we assume to be $C_1$ w.l.o.g. Let
$V(C_1)$ (resp. $E(C_1)$) be its set of vertices (resp. edges). Let us show that $J^1_x$ has at
least one eigenvalue with positive real part.

Indeed, let us compute the trace of $J^1_x$:

\[ \mathrm{Tr}(J^1_x) = H(x) \sum_{(i,j) \in E(C_1)} \Bigl(1 - \frac{x_{ij}}{x_i} - \frac{x_{ij}}{x_j}\Bigr) = H(x)\bigl(|E(C_1)| - |V(C_1)|\bigr) \ge 0. \]

The last inequality comes from the fact that the number of edges is greater than or
equal to the number of vertices in $C_1$ because every vertex has at least two edges in $C_1$.

Now it is easy to check that $J^1_x \mathbb{1} = -H(x)\,\mathbb{1}$, where $\mathbb{1} = (1, \dots, 1)^{T}$, and therefore
that $-H(x)$ is an eigenvalue of $J^1_x$, which enables us to conclude (b) and the first part
of (a).
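
The trace identity and the eigenvalue $-H(x)$ can be checked numerically on the simplest unstable example, the $2 \times 2$ pooling equilibrium $x_{ij} = 1/4$, whose graph is a single component with four edges and four vertices. The sketch below assumes the vector field $F_{lk}(x) = x_{lk}\,(x_{lk}/(x_l x_k) - H(x))$, extended without the simplex constraint, and approximates the Jacobian by central finite differences; the function names are hypothetical.

```python
def H(x):
    """H(x) = sum_{i,j} x_ij^2 / (x_i x_j), with x_i, x_j the row and column sums."""
    rows = [sum(r) for r in x]
    cols = [sum(c) for c in zip(*x)]
    return sum(x[i][j] ** 2 / (rows[i] * cols[j])
               for i in range(len(x)) for j in range(len(x[0])))

def F(x):
    """Assumed mean-field vector field F_lk(x) = x_lk * (x_lk / (x_l x_k) - H(x))."""
    rows = [sum(r) for r in x]
    cols = [sum(c) for c in zip(*x)]
    h = H(x)
    return [[x[l][k] * (x[l][k] / (rows[l] * cols[k]) - h)
             for k in range(len(x[0]))] for l in range(len(x))]

def jacobian(x, step=1e-6):
    """Central differences; row (i,j), column (l,k) approximates dF_lk / dx_ij."""
    pairs = [(i, j) for i in range(len(x)) for j in range(len(x[0]))]
    jac = []
    for i, j in pairs:
        up = [row[:] for row in x]
        dn = [row[:] for row in x]
        up[i][j] += step
        dn[i][j] -= step
        f_up, f_dn = F(up), F(dn)
        jac.append([(f_up[l][k] - f_dn[l][k]) / (2 * step) for l, k in pairs])
    return jac

x = [[0.25, 0.25], [0.25, 0.25]]   # pooling equilibrium: |E| = |V| = 4
J = jacobian(x)
trace = sum(J[a][a] for a in range(len(J)))
row_sums = [sum(row) for row in J]   # each should be close to -H(x)
```

Here the trace is numerically zero, matching $H(x)(|E(C_1)| - |V(C_1)|)$, while every row sums to $-H(x) = -1$; the remaining eigenvalues therefore sum to $H(x) > 0$, so at least one has positive real part.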

Now suppose that $x \in \Gamma_0$, and that $(P)_{\mathcal{G}_x}$ holds. Then each component of $\mathcal{G}_x$ has
only one state or only one signal. Let us assume for instance that $C_1$ consists of states
$1, \dots, k$ and signal $A$. Then $J^1_x$ equals

\[
-\frac{H(x)}{x_A}
\begin{pmatrix}
x_1 & x_2 & \cdots & x_k \\
x_1 & x_2 & \cdots & x_k \\
\vdots & \vdots & & \vdots \\
x_1 & x_2 & \cdots & x_k
\end{pmatrix}.
\]

The rank of $J^1_x$ is 1, and its eigenvalues are $0$ and $-H(x)$, which completes the proof.



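(c) Let $x \in \Gamma_u$ and $\theta \in \mathcal{E}_x \setminus \{0\}$. Let $\{e_{ij}\}$ be an orthogonal basis of $\mathbb{R}^{M_1 \times M_2}$. Let

\[ W_{n+1} := \mathbb{1}_{\{|V_{n+1} - V_n| = 1\}}\,(V_{n+1} - V_n - x_n) = (1 + T_n)(x_{n+1} - x_n). \]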



We note that 



Wn+i -9 = 0, with probability 1 



H(Xr, 



Ml 



Wi G Si,j G ^2, Wn+i ■ 9 = {1 — x^,)eij ■ 9 , with probability 



Ml 



Note that a; G implies that H{x) ^ Mi and that, for all G Sx, Xij ^ 1. 



Therefore, assuming that x„ is in the neighbourhood of x (for which = H{x)), 



Lemma 5.8 implies 



Var(W„+i ■9\J^n) ^ . max <^ mm 1 -— , -j^ 



(31) 



^ Cst(x) > 0, (32) 
where we use $\theta \in \mathcal{E}_x$ in the penultimate inequality. This completes the proof. □

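If $M_1 = M_2$, let us define the set of equilibria $\Gamma_{\mathrm{sig}}$ as follows: $x \in \Gamma_{\mathrm{sig}}$ if and only
if there exists a bijective map $\phi$ from $S_1$ to $S_2$ such that

\[
x_{ij} = \begin{cases} 1/M_1 & \text{if } j = \phi(i), \\ 0 & \text{otherwise.} \end{cases}
\]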



Equilibria in $\Gamma_{\mathrm{sig}}$ correspond to a perfectly efficient signaling system, in the sense
that they bear no synonyms or informational bottlenecks. It is easy to check from the
calculation in the proof of (b) of Proposition 5.2 that the Jacobian matrix at such an
$x$ has the only eigenvalue $-H(x)$, which implies asymptotic stability.

Corollary 5.1. Suppose the game has the same number of states and signals, i.e.
$M_1 = M_2$, and that $x \in \Gamma_{\mathrm{sig}}$. Then $x$ is asymptotically stable for the mean-field ODE.



6 Links with static equilibrium analysis 

We present in this section some results on the characterization of evolutionarily
stable strategies (ESS) and neutrally stable strategies (NSS) for signaling games. Note
that Taylor and Jonker (1978), and Zeeman (1979) propose, in a general setting, conditions
under which an evolutionarily stable strategy is indeed a (dynamically) stable
equilibrium. However the condition is quite strong, and only applies to special cases of
the signaling game.

We show here an equivalence between this static context and the underlying reinforcement
learning dynamics, i.e. that



(1) the set of ESS matches the set of asymptotically stable equilibria of the mean-field 
ODE; 

(2) the set of neutrally stable strategies matches the set of linearly stable equilibria of 
the mean-field ODE; 

(3) the set of Nash equilibria matches the set $\Lambda$ where $\nabla H \cdot F$ vanishes.



In Sections 6.1, 6.2 and 6.3 we successively present usual notions in the static setting of
signaling games, results on equilibrium selection and the connection between stability
of the dynamics and the static setting of the game mentioned above.



6.1 Static setting 

Let 

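\[ \mathcal{P} = \Bigl\{P \in \mathbb{R}_+^{M_1 \times M_2} : \forall\, i \in S_1,\ \sum_{j} p_{ij} = 1\Bigr\}, \qquad
\mathcal{Q} = \Bigl\{Q \in \mathbb{R}_+^{M_2 \times M_1} : \forall\, j \in S_2,\ \sum_{i} q_{ji} = 1\Bigr\}. \]

A Sender's strategy can be represented by an $M_1 \times M_2$ matrix $P \in \mathcal{P}$ in the sense
that if he sees state $i$, he chooses signal $j$ with probability $p_{ij}$. Also we can represent
Receiver's strategy by an $M_2 \times M_1$ matrix $Q \in \mathcal{Q}$, i.e. if he sees signal $j$ he chooses act
$i$ with probability $q_{ji}$. The payoff function is

\[ \mathrm{Payoff}(P,Q) = \sum_{i,j} p_{ij}\, q_{ji} = \mathrm{tr}(PQ). \]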
As in the real world, where somebody can be sender at one time and receiver at
another, the authors assume that each individual can be Sender or Receiver with equal
probability at each time, which symmetrizes the game. Thus the (mixed) strategy
of any of the two players of the symmetrized game is a pair of Sender and Receiver
matrices $(P, Q)$, and the payoff function $T$ can be written as

\[ T[(P, Q), (P', Q')] = T[(P', Q'), (P, Q)] = \tfrac{1}{2}\mathrm{tr}(PQ') + \tfrac{1}{2}\mathrm{tr}(P'Q). \]
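
As a small illustration of these two formulas, the sketch below evaluates $\mathrm{tr}(PQ)$ and the symmetrized payoff $T$ on plain nested lists. The function names and the two example strategies (a perfect signaling pair and a uniformly mixing pair, both with two states and two signals) are hypothetical.

```python
def payoff(P, Q):
    """Payoff(P, Q) = sum_{i,j} p_ij q_ji = tr(PQ), for P of size M1 x M2 and Q of size M2 x M1."""
    return sum(P[i][j] * Q[j][i] for i in range(len(P)) for j in range(len(Q)))

def T(strategy, other):
    """Symmetrized payoff T[(P, Q), (P', Q')] = tr(P Q') / 2 + tr(P' Q) / 2."""
    (P, Q), (P2, Q2) = strategy, other
    return 0.5 * payoff(P, Q2) + 0.5 * payoff(P2, Q)

# Hypothetical strategies with M1 = M2 = 2.
signaling = ([[1, 0], [0, 1]], [[1, 0], [0, 1]])            # permutation P, Q = P^T
pooling = ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])

t_self = T(signaling, signaling)     # payoff of the signaling system against itself
t_cross = T(signaling, pooling)      # symmetric in its two arguments
```

Against itself the signaling pair earns the maximal value $\mathrm{tr}(PQ) = M_1$, whereas any pairing that involves the pooling strategy earns 1.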
6.2 Equilibrium selection 

Definition 6.1. A strategy $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ is called a Nash strategy if

\[ T[(P, Q), (P, Q)] \ge T[(P', Q'), (P, Q)], \quad \forall\, (P', Q') \in \mathcal{P} \times \mathcal{Q}. \]



There are uncountably many Nash strategies in signaling games. Let us recall the 
following notions of evolutionary and neutrally stable equilibria, which enable one to 
distinguish the relevant limiting strategies of the game; note that the notions are purely 
static here. 

Definition 6.2. A strategy $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ is evolutionarily stable if

(i) it is a Nash strategy, and

(ii) $T[(P, Q), (P', Q')] > T[(P', Q'), (P', Q')]$ for all $(P', Q') \neq (P, Q)$.

Definition 6.3. A strategy $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ is neutrally stable if

(i) it is a Nash strategy, and

(ii) if $T[(P, Q), (P, Q)] = T[(P', Q'), (P, Q)]$ for some $(P', Q') \in \mathcal{P} \times \mathcal{Q}$, then

\[ T[(P, Q), (P', Q')] \ge T[(P', Q'), (P', Q')]. \]



The following Propositions 6.1, 6.2 and 6.3 characterize ESSs and NSSs in the 
signaling game. 

Proposition 6.1 (Trapa and Nowak (2000)). Let $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ be such that neither
$P$ nor $Q$ contains a column that consists entirely of zeros. Then $(P, Q)$ is a Nash
strategy if and only if there exist positive numbers $p_1, \dots, p_n$ and $q_1, \dots, q_m$ such that

(1) for each $j$, the $j$-th column of $P$ has its entries drawn from $\{0, p_j\}$,

(2) for each $i$, the $i$-th column of $Q$ has its entries drawn from $\{0, q_i\}$,

(3) for all $i, j$, $p_{ij} \neq 0$ if and only if $q_{ji} \neq 0$.



Remark. The assumption that neither P nor Q contains a column consisting entirely 
of zeros corresponds to the requirement that no signal or act falls out of use. 

Proposition 6.2 (Pawlowitsch (2007)). Let $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ be a Nash strategy. Then
$(P, Q)$ is a neutrally stable strategy if and only if

(1) at least one of the two matrices, P or Q, has no zero column, and 

(2) if P or Q contains a column with more than one positive element, then 
all the elements in this column take values in {0,1}. 



Proposition 6.3 (Trapa and Nowak (2000)). $(P,Q) \in \mathcal{P} \times \mathcal{Q}$ is an evolutionarily
stable strategy if and only if $M_1 = M_2$, $P$ is a permutation matrix and $Q = P^{T}$.

6.3 Connection

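Let us now define a map between the static and dynamic learning models, in order
to emphasize the correspondence between stability in either setting.

Let $s_1$ be a bijective map from $S_1$ to $\{1, \dots, M_1\}$ and $s_2$ be a bijective map from
$S_2$ to $\{1, \dots, M_2\}$. Let us define the map $\Psi$ from $\Delta \setminus \partial\Delta$ to $\mathcal{P} \times \mathcal{Q}$: $\Psi(x) = (P, Q)$ where

\[ p_{s_1(i)s_2(j)} = \frac{x_{ij}}{x_i}, \qquad q_{s_2(j)s_1(i)} = \frac{x_{ij}}{x_j}, \qquad \forall\, i \in S_1,\ j \in S_2. \]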
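
A minimal sketch of this map, assuming the normalisations $p_{ij} = x_{ij}/x_i$ and $q_{ji} = x_{ij}/x_j$ and taking $s_1$, $s_2$ to be the identity labellings, is given below. The function name `psi` and the example point (two states, three signals) are hypothetical.

```python
from fractions import Fraction

def psi(x):
    """Map x (positive row and column sums) to (P, Q) with p_ij = x_ij / x_i, q_ji = x_ij / x_j."""
    m1, m2 = len(x), len(x[0])
    rows = [sum(r) for r in x]
    cols = [sum(c) for c in zip(*x)]
    P = [[x[i][j] / rows[i] for j in range(m2)] for i in range(m1)]
    Q = [[x[i][j] / cols[j] for i in range(m1)] for j in range(m2)]
    return P, Q

def payoff(P, Q):
    """tr(PQ) = sum_{i,j} p_ij q_ji."""
    return sum(P[i][j] * Q[j][i] for i in range(len(P)) for j in range(len(Q)))

# Hypothetical point: state 0 uses signal 0, state 1 splits between signals 1 and 2.
x = [[Fraction(1, 2), 0, 0],
     [0, Fraction(1, 4), Fraction(1, 4)]]
P, Q = psi(x)
value = payoff(P, Q)   # equals sum_{i,j} x_ij^2 / (x_i x_j)
```

With these normalisations $\mathrm{tr}(PQ) = \sum_{i,j} x_{ij}^2/(x_i x_j)$, so the static payoff of $\Psi(x)$ against itself coincides with $H(x)$ as written in the sketch; the columns of $P$ and $Q$ obtained here also have the $\{0, p_j\}$ and $\{0, q_i\}$ structure of Proposition 6.1.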



Proposition 6.4. Let 




{P,Q) &V X Q : neither P nor Q contains 



any column that consists entirely of zeros 



and let 



■VxQ ■= {{P, Q) eV X Q is a Nash strategy } n L, 



J^VxQ '■— {{P, Q) & V X Q is a neutrally stable strategy } n F, 



^VxQ '■= {{P, Q) & V X Q is an evolutionarily stable strategy } n F. 



Then 



(a) ^((A \ (9A) n A) = CvxQ and ^((A \ ^A) \ A)) n £pxQ = 0- 



(b) *(F,)=£p,2. 



(C) ^(F,ig) =£pxQ. 
Proof. (a) is a direct consequence of Lemma 4.3 and Proposition 6.1.



Conversely, given {P,Q) E C-pxQ, let pi, ... Pn and gi, ... qm be defined as in 



Proposition 6.1 Note that 



Psi{i)s2(j)Qs2{j)si{i) Ps2{j)Qsi{i) '^Psj^{i)s2(j)¥'0 Psi{i)s2(j)Qsi{i) Ps2{j)Qs2{j)si{i)- 



Define 



Z := ^ Psi{i)s2{j)(ls2U)siii) = ^qsiii) = ^P. 



'S2(j)5 



and let 



Xij :- 



Xj . 



Xj .- 



Psi{i)s2{j)Qs2U)si{i) 



i G Si J e ^2 



"^Xij, jeS2. 



Pj 



T.k€S2 Pk 



Then $(P, Q) = \Psi(x)$, and $x \in (\Delta \setminus \partial\Delta) \cap \Lambda$ is again a direct consequence of Lemma 4.3 and Proposition 6.1.



Let us now prove (b), and assume $(P,Q) = \Psi(x)$: if one column of $P$ (resp. $Q$)
has more than one positive element, this corresponds to an informational bottleneck (resp.
synonym). Hence Condition (2) in Proposition 6.2 means that a state(act)-signal correspondence
cannot be associated both to a synonym and an informational bottleneck,
which translates into $(P)_{\mathcal{G}_x}$. (c) follows from Corollary 5.1 and Proposition 6.3. □
7 Convergence with positive probability to stable
configurations



Let us prove the more general 

Theorem 7.1. Let $q \in \Gamma_s$, and let $\mathcal{N}(q)$ be a neighbourhood of $q$ in $\Delta$. Then, with positive
probability,

(a) $x_n \to x \in \mathcal{N}(q)$ s.t. $\mathcal{G}_x = \mathcal{G}_q$.

(b) 1^(00,^,^') = 00 <^=^ {hi} is an edge of Q . 



Theorem 7.1 is an obvious consequence of the following Proposition 7.1. Given
$\mathcal{G} = (S, \sim)$, assume that $(P)_{\mathcal{G}}$ holds: then each connected component of $\mathcal{G}$ contains
either only a single state or a single signal. Let $\pi := \pi_{\mathcal{G}} : S \to S$ be the function
mapping $i \in S$ to the single state/signal in the same connected component as $i$, with
the convention that we choose the state if the component consists only of one state and
one signal.
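
Property $(P)$ as described here (every connected component contains a single state or a single signal) is straightforward to test on a finite graph. The sketch below is an illustration only: it represents a graph as a set of (state, signal) edges, ignores isolated vertices, and uses hypothetical function names.

```python
def components(edges):
    """Connected components of a bipartite graph given as a set of (state, signal) edges."""
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(("state", i), set()).add(("signal", j))
        adjacency.setdefault(("signal", j), set()).add(("state", i))
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                       # depth-first exploration from `start`
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(adjacency[vertex] - component)
        seen |= component
        result.append(component)
    return result

def has_property_P(edges):
    """True if no component contains both several states and several signals."""
    for component in components(edges):
        n_states = sum(1 for kind, _ in component if kind == "state")
        n_signals = len(component) - n_states
        if n_states > 1 and n_signals > 1:
            return False
    return True

perfect = {(0, 0), (1, 1)}            # one-to-one signaling system
synonyms = {(0, 0), (1, 1), (1, 2)}   # state 1 has two synonymous signals
mixed = {(0, 0), (1, 0), (1, 1)}      # signal 0 is a bottleneck and state 1 has a synonym
```

The first two graphs satisfy the property, while the third does not: state 1 is associated both with an informational bottleneck (signal 0) and with a synonym (signals 0 and 1).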



For alH G iS, n G N and e > 0, let us define 



._ „n / n 

K:= n {Vr^2en}, 

Hl:= n {V-^V^}. 

Proposition 7.1. Let $\mathcal{G}$ be such that $(P)_{\mathcal{G}}$ holds, and let $\pi := \pi_{\mathcal{G}}$. For all $\epsilon \in (0, 1/M_1)$,
if $H^1_n$ and $H^2_n$ hold, and $n \ge \mathrm{Cst}(\epsilon, M_1, M_2)$, then, with lower bounded probability
(only depending on $\epsilon$, $M_1$ and $M_2$), for all $i, j \in S$, $k \ge n$,
= V:;, when nil) ^ 7r(j); (33) 
af/«re(l-e,l + e); (34) 
^ ek, when 7r(i) = i. (35) 



In the remainder of this section, we fix the graph $(\mathcal{G}, \sim)$ (and thus $\pi = \pi_{\mathcal{G}}$) and
$\epsilon > 0$. The proof consists of the following Lemmas 7.1-7.3.



Let, for all i, j E S, n E N, 



t'/'^ := M{k ^n:V'^ V^}- 



2,i 



3,i ._ 



inf{fc ^ n ■ Ci\la1 ^ (1 - e, 1 + e)}; 
inf{/c ~^n:V^< ek}, 



and let 



inf 7-i'*J r^-=infr^'* r 



3 ._ 
n ■ 



inf T^'\ Tn:=T^AT^ATl 
«£o ,ir(i)=i 



Lemma 7.1. If n ^ Cst(e, Mi, M2), then 
Proof. Assuming $n \ge \mathrm{Cst}(\epsilon, M_1, M_2)$,



2 ^ e4A;2 

7r(i)^7r(j),fe^n 



^ exp (-2e-^MiM2) 



Lemma 7.2. If $n \ge \mathrm{Cst}(\epsilon, M_1, M_2)$ then, for all $i \in S$,

P( T„^'^ > T> I j;, ^ ^ 1 _ 2 exp(-Cst(e)n). 

Proof. Fix $i \in S$, $n \in \mathbb{N}$, and assume w.l.o.g. that $\pi(i) \neq i$.
Let, for all $j \in S$ and $k \ge n$,

which is equal to V^f' as long as A; < r^. 
Let, for all k ^ n, 

W^fe:=log ' 



and let us consider the Doob decomposition of $(W_k)_{k \ge n}$:



A,:= ^ E(W^,--W^,_i|.F,_i) 
□



Assume that $H^1_n$ and $H^2_n$ hold, and that $k < \tau_n$. Then



1 / ^i7r(j) 

WA~vf} K 



^(l + n((v;r^)-— 



7r(i) 



A;-3/2n(Cst(e,Mi,M2)) 



where we use that, for all j ~ vr(i). 



1 l/*^ 

J- X ^ 7r(«)j 7r(t)j 



j~7r(i) 



l + k-^/^U{Qst{e,Mi,M2) - 



j~7r{i) 



7r(j) 



7r(i)j 



yk 



E 



(36) 



Therefore, for all $k \ge n$,

\[ |A_k| \le n^{-1/2}\, \mathrm{Cst}(\epsilon, M_1, M_2). \]



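Let us now estimate the martingale increment: $|\Psi_{k+1} - \Psi_k| \le \mathrm{Cst}(\epsilon)\,k^{-1}$ (since
$|W_{k+1} - W_k| \le \mathrm{Cst}(\epsilon)\,k^{-1}$), so that Lemma 7.4 implies

\[ P\Bigl(\sup_{k \ge n} |\Psi_k - \Psi_n| \le \epsilon/2 \,\Big|\, \mathcal{F}_n\Bigr) \ge 1 - 2\exp(-\mathrm{Cst}(\epsilon)\,n), \]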



which completes the proof. 



□ 



Lemma 7.3. If $\epsilon \in (0, 1/M_1)$ and $n \ge \mathrm{Cst}(\epsilon, M_1, M_2)$ then, for all $i \in S$ such that
$\pi(i) = i$,



^{r^' > A T^J'r^HlHl^ > 1 - 2 exp(-Cst(e)n). 



Proof. Let $n \in \mathbb{N}$, assume that $H^1_n$ and $H^2_n$ hold, and fix $i \in S$ such that $\pi(i) = i$.



Let us consider the Doob decomposition of {Vj^ 



k -T ^k 



Now, for all r] > 0, ii n ^ Cst(?7,e) and k < Tn, (36) implies 



^k,.-^k = E{vr-v^\:F,)^^Y.y 



(y^)\ 1 



Ml ^ IZ/IA'^ " Ml 



Let us now estimate the martingale increment: let, for all p ^ n, 



p-i ^ 



k=n 



Then, for all p ^ n, 



Y iXk+i - Xk)k = - Y Xk + {p- l)Xp- 

nsjfc^p— 1 n^fc^p— 1 



This implies, using Lemma 7.4 (and |Sjfc+i — S^l ^ 1 for all k ^ n) that, for all 



e > 0, 



P (Vfc ^ n, V;'^ ^ (2e - r])^ + (A; - n)(l/Mi - r]) \ J^n) 
I sup




< V 1 J^n^ 


^ P (sup IXifcl 


/A 




P 






Z 



we choose $\eta = \min(\epsilon/2,\ 1/M_1 - \epsilon)$, which completes the proof.



□ 



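Lemma 7.4. Let $(\gamma_k)_{k\in\mathbb{N}}$ be a deterministic sequence of positive reals, let $\mathbb{G} := (\mathcal{G}_n)_{n\in\mathbb{N}}$
be a filtration, and let $(M_n)_{n\in\mathbb{N}}$ be a $\mathbb{G}$-adapted martingale such that $|M_{n+1} - M_n| \le \gamma_n$
for all $n \in \mathbb{N}$. Then, for all $n \in \mathbb{N}$ and $\lambda > 0$,

\[ P\Bigl(\sup_{k \ge n}(M_k - M_n) \ge \lambda \,\Big|\, \mathcal{G}_n\Bigr) \le \exp\Bigl(-\frac{\lambda^2}{2\sum_{k \ge n}\gamma_k^2}\Bigr). \]

Proof. Let, for all $\theta \in \mathbb{R}$,

\[ Z_n(\theta) := \exp\Bigl(\theta M_n - \frac{\theta^2}{2}\sum_{k=1}^{n-1}\gamma_k^2\Bigr). \]

Then $(Z_n(\theta))_{n\in\mathbb{N}}$ is a supermartingale, so that, for $\theta > 0$,

\begin{align*}
P\Bigl(\sup_{k \ge n}(M_k - M_n) \ge \lambda \,\Big|\, \mathcal{G}_n\Bigr)
 &\le P\Bigl(\sup_{k \ge n} Z_k(\theta) \ge Z_n(\theta)\exp\Bigl(\lambda\theta - \frac{\theta^2}{2}\sum_{k \ge n}\gamma_k^2\Bigr) \,\Big|\, \mathcal{G}_n\Bigr) \\
 &\le \exp\Bigl(\frac{\theta^2}{2}\sum_{k \ge n}\gamma_k^2 - \lambda\theta\Bigr)
  = \exp\Bigl(-\frac{\lambda^2}{2\sum_{k \ge n}\gamma_k^2}\Bigr)
\end{align*}

if we choose $\theta := \lambda / \sum_{k \ge n}\gamma_k^2$.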



□ 
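
As a sanity check of the bound in Lemma 7.4, the sketch below simulates the simplest martingale with increments bounded by $\gamma_k$, namely a walk with independent steps $\pm\gamma_k$, and compares the empirical frequency of $\{\sup_k (M_k - M_n) \ge \lambda\}$ with $\exp(-\lambda^2/(2\sum_k \gamma_k^2))$. The choice $\gamma_k = 1/k$ (the order of magnitude used in the proof of Lemma 7.2), the finite horizon, the seed and all function names are assumptions made for illustration.

```python
import math
import random

def maximal_bound(lam, gammas):
    """exp(-lam^2 / (2 * sum of gamma_k^2)), the upper bound stated in Lemma 7.4."""
    return math.exp(-lam ** 2 / (2 * sum(g * g for g in gammas)))

def exceed_frequency(lam, gammas, trials, rng):
    """Fraction of simulated +/- gamma_k walks whose running value reaches lam."""
    hits = 0
    for _ in range(trials):
        position = 0.0
        for g in gammas:
            position += g if rng.random() < 0.5 else -g
            if position >= lam:
                hits += 1
                break
    return hits / trials

rng = random.Random(0)                            # fixed seed for reproducibility
gammas = [1.0 / k for k in range(10, 210)]        # gamma_k = 1/k for k = 10, ..., 209
lam = 0.5
bound = maximal_bound(lam, gammas)
frequency = exceed_frequency(lam, gammas, 2000, rng)
```

The empirical frequency stays well below the bound, which is not tight for this particular walk; the lemma only needs the exponential decay in $\lambda^2 / \sum_{k \ge n} \gamma_k^2$.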